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Abstract 

Bayesian methods and their implementations by means of sophisticated Monte Carlo techniques, 
such as Markov chain Monte Carlo (MCMC) and particle filters, have become very popular in signal 
processing over the last years. However, in many problems of practical interest these techniques demand 
procedures for sampling from probability distributions with non-standard forms, hence we are often 
brought back to the consideration of fundamental simulation algorithms, such as rejection sampling 
(RS). Unfortunately, the use of RS techniques demands the calculation of tight upper bounds for the 
ratio of the target probability density function (pdf) over the proposal density from which candidate 
samples are drawn. Except for the class of log-concave target pdf's, for which an efficient algorithm 
exists, there are no general methods to analytically determine this bound, which has to be derived 
from scratch for each specific case. In this paper, we introduce new schemes for (a) obtaining upper 
bounds for likelihood functions and (b) adaptively computing proposal densities that approximate the 
target pdf closely. The former class of methods provides the tools to easily sample from a posteriori 
probability distributions (that appear very often in signal processing problems) by drawing candidates 
from the prior distribution. However, they are even more useful when they are exploited to derive the 
generalized adaptive RS (GARS) algorithm introduced in the second part of the paper. The proposed 
GARS method yields a sequence of proposal densities that converge towards the target pdf and enable 
a very efficient sampling of a broad class of probability distributions, possibly with multiple modes and 
non-standard forms. We provide some simple numerical examples to illustrate the use of the proposed 
techniques, including an example of target localization using range measurements, often encountered in 
sensor network applications. 

Index Terms 

Rejection sampling; adaptive rejection sampling; Gibbs sampling; particle filtering; Monte Carlo 
integration; sensor networks; target localization. 
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I. Introduction 

Bayesian methods have become very popular in signal processing during the past decades and, with 
them, there has been a surge of interest in the Monte Carlo techniques that are often necessary for the 
implementation of optimal a posteriori estimators ||6l|4l[T3l[l2]|. Indeed, Monte Carlo statistical methods 
are powerful tools for numerical inference and optimization fT3l. Currently, there exist several classes 
of MC techniques, including the popular Markov Chain Monte Carlo (MCMC) Ol and particle 
filtering ^ |4]| families of algorithms, which enjoy numerous applications. However, in many problems 
of practical interest these techniques demand procedures for sampling from probability distributions with 
non-standard forms, hence we are often brought back to the consideration of fundamental simulation 
algorithms, such as importance sampling 0, inversion procedures [il3J and the accept/reject method, 
also known as rejection sampling (RS). 

The RS approach |13, Chapter 2] is a classical Monte Carlo technique for "universal sampling". It 
can be used to generate samples from a target probability density function (pdf) by drawing from a 
possibly simpler proposal density. The sample is either accepted or rejected by an adequate test of the 
ratio of the two pdf's, and it can be proved that accepted samples are actually distributed according to 
the target density. RS can be applied as a tool by itself, in problems where the goal is to approximate 
integrals with respect to (w.r.t.) the pdf of interest, but more often it is a useful building block for more 
sophisticated Monte Carlo procedures |[8l [151 US- An important limitation of RS methods is the need to 
analytically establish a bound for the ratio of the target and proposal densities, since there is a lack of 
general methods for the computation of exact bounds. 

One exception is the so-called adaptive rejection sampling (ARS) method |[8l [71 [131 which, given a 
target density, provides a procedure to obtain both a suitable proposal pdf (easy to draw from) and the 
upper bound for the ratio of the target density over this proposal. Unfortunately, this procedure is only 
valid when the target pdf is strictly log-concave, which is not the case in most practical cases. Although 
an extension has been proposed ||9l [3 that enables the application of the ARS algorithm with T-concave 
distributions (where T is a monotonically increasing function, not necessarily the logarithm), it does not 
address the main limitations of the original method (e.g., the impossibility to draw from multimodal 
distributions) and is hard to apply, due to the difficulty to find adequate T transformations other than the 
logarithm. Another algorithm, called adaptive rejection metropolis sampling (ARMS) [14J, is an attempt 
to extend the ARS to multimodal densities by adding Metropolis-Hastings steps. However, the use of an 
MCMC procedure has two important consequences. First, the resulting samples are correlated (unlike in 
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the original ARS method), and, second, for multimodal distributions the Markov Chain often tends to 
get trapped in a single mode. 

In this paper we propose general procedures to apply RS when the target pdf is the posterior density 
of a signal of interest (Sol) given a collection of observations. Unlike the ARS technique, our methods 
can handle target pdf's with several modes (hence non-log-concave) and, unlike the ARMS algorithm, 
they do not involve MCMC steps. Hence, the resulting samples are independent and come exactly from 
the target pdf. 

We first tackle the problem of computing an upper bound for the likelihood of the Sol given fixed 
observations. The proposed solutions, that include both closed-form bounds and iterative procedures, are 
useful when we draw the candidate samples from the prior pdf. 

In this second part of the paper, we extend our approach to devise a generalization of the ARS method 
that can be applied to a broad class of pdf's, possibly multimodal. The generalized algorithm yields an 
efficient proposal density, tailored to the target density, that can attain a much better acceptance rate 
than the prior distribution. We remark that accepted samples from the target pdf are independent and 
identically distributed (i.i.d). 

The remaining of the paper is organized as follows. We formally describe the signal model in Section 
[nl Some useful definitions and basic assumptions are introduced in Section [lll| In Section |IV| we propose 



a general procedure to compute upper bounds for a large family of likelihood functions. The ARS method 
is briefly reviewed in Section |V} while the main contribution of the paper, the generalization of the ARS 



algorithm, is introduced in Section VI Section VII is devoted to simple numerical examples and we 



conclude with a brief summary in Section VIII 



II. Model and Problem Statement 

A. Notation 

Scalar magnitudes are denoted using regular face letters, e.g., x, X, while vectors are displayed as 
bold-face letters, e.g., x, X. We indicate random variates with upper-case letters, e.g., X, X, while we 
use lower-case letters to denote the corresponding realizations, e.g., x, x. We use letter p to denote the 
true probability density function (pdf) of a random variable or vector. This is an argument- wise notation, 
common in Bayesian analysis. For two random variables X and Y, p{x) is the true pdf of X and p{y) 
is the true pdf of Y, possibly different. The conditional pdf of X given Y = y is written p{x\y). Sets 
are denoted with calligraphic upper-case letters, e.g., 71. 
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B. Signal Model 

Many problems in science and engineering involve the estimation of an unobserved Sol, x G M^, from 
a sequence of related observations. We assume an arbitrary prior probability density function (pdf) for 
the Sol, X ^ p(x), and consider n scalar random observations, G M, i = 1, . . . , n, which are obtained 
through nonlinear transformations of the signal X contaminated with additive noise. Formally, we write 

Yi = ^i(X) + Gi, . . . = gn{X) + (1) 

where Y = [Yi, . . . , YnY ^ is the random observation vector, gi : ^ M, i = 1, . . . , n, are 
nonlinearities and 6^ are independent noise variables, possibly with different distributions for each i. We 
write y = [yi, . . . , ^^]^ G for the vector of available observations, i.e., a realization of Y. 
We assume exponential-type noise pdf's, of the form 

Q^r^p{^i)^he^v{-V^{^i)]. (2) 

where A:^ > is real constant and Vi^di) is a function, subsequently referred to as marginal potential, 
with the following properties: 

(PI) It is real and non negative, i.e., : R ^ [0, +oc). 

(P2) It is increasing > 0) for > and decreasing < 0) for < 0. 

These conditions imply that Vi{'di) has a unique minimum at i9* = and, as a consequence p{'di) 
has only one maximum (mode) at i9* = 0. Since the noise variables are independent, the joint pdf 
^2, • • • , ^n) = niLi^(^^) i^ ^^^y construct and we can define a joint potential function 

n 

. . . , I?,) 4 - log . . . , ^n)] = - E (3) 

i=l 

Substituting ^ into ([3]) yields 

n 

yW(i9i,...,i9,) = c, + 5^y,(i9,) (4) 

i=l 

where = — X^ILi is a constant. In subsequent sections we will be interested in a particular class 
of joint potential functions denoted as 

n 

y/")(i9i, . . . , i9,) = 5^ < / < +OC, (5) 

i=l 

where the subscript / identifies the specific member of the class. In particular, the function obtained for 
/ = 2, V2^\i}i, . . . , i}n) = J27=i l^^l^ i^ termed quadratic potential. 



5 



Let g = [gi^ . . . ^ Qn]^ be the vector- valued nonlinearity defined as g(x) = [^i(x), . . . ^gn{x)]^ . The 
scalar observations are conditionally independent given a realization of the Sol, X = x, hence the 
likelihood function £(x;y, g) ^ p{y\x), can be factorized as 

n 

^(x;y,g) = f]p(y,|x), (6) 

i=l 

where p{yi\x) = kiexp[—Vi{yi — ^^(x))}. The likelihood in ^ induces a system potential function 
V{x; y, g) : ^ [0, +00), defined as 



y(x;y,g) ^ -log[^(x;y,g)] = - log[p(y,|x)], (7) 

i=l 

that depends on x, the observations y, and the function g. Using (|4]) and (|7]), we can write the system 
potential in terms of the joint potential, 

n 

y(x; y, g) = + ^ V,{y, - g,{x)) = y(^) {y, - ^i(x), . . . , - ^,(x)). (8) 



C. Rejection Sampling 

Assume that we wish to approximate, by sampling, some integral of the form /(/) = f{x)p(x\y)dx, 
where / is some measurable function of x and p{x\y) oc p{x)i{x; y, g) is the posterior pdf of the Sol given 
the observations. Unfortunately, it may not be possible in general to draw directly from p{x\y), so we 
need to apply simulation techniques to generate adequate samples. One appealing possibility is to perform 
RS using the prior, p{x), as a proposal function. In such case, let 7 be a lower bound for the system 
potential, 7 < V{x; y, g), so that L = exp{— 7} is an upper bound for the likelihood, ^(x; y, g) < L. We 
can generate N samples according to the standard RS algorithm. 

1) Set i = 1. 

2) Draw samples x^ from p{x) and from U{0, 1), where U{0, 1) is the uniform pdf in [0, 1]. 
^) ^ ^(^^ > then x(^) = x', else discard x' and go back to step 2. 

4) Set z = z + l. Ifz>A^ then stop, else go back to step 2. 
Then, /(/) can be approximated as /(/) ^ /(/) = j^J^iLifi^^^'^)- The fundamental figure of merit 
of a rejection sampler is the acceptance rate, i.e., the mean number of accepted samples over the total 
number of proposed candidates. 



In Section [IV} we address the problem of analytically calculating the bound L = exp{— 7}. Note 
that, since the log function is monotonous, it is equivalent to maximize £ w.r.t. x and to minimize the 
system potential V also w.r.t. x. As a consequence, we may focus on the calculation of a lower bound 
7 for V{x; y, g). Note that this problem is far from trivial. Even for very simple marginal potentials, t^. 
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z = 1, n, the system potential can be highly multimodal w.r.t. x. See the example in the Section fVII-A 
for an illustration. 

III. Definitions and Assumptions 

Hereafter, we restrict our attention to the case of a scalar Sol, x G M. This is done for the sake of 
clarity, since dealing with the general case x G requires additional definitions and notations. The 
techniques to be described in Sections |IV||VT| can be extended to the general case, although this extension 



is not trivial. The example in Section |VII-C| illustrates how the proposal methodology is also useful in 
higher dimensional spaces, though. 

For a given vector of observations Y = y, we define the set of simple estimates of the Sol as 

X ^ {xi eR: yi = gi{xi) for z = 1, . . . , n} . (9) 

Each equation yi = gi{xi), in general, can yield zero, one or more simple estimates. We also introduce 
the maximum likelihood (ML) Sol estimator X, as 

X E argmax^(x|y, g) = argminy(x;y, g), (10) 

xeR xeR 

not necessarily unique. 

Let us use ^ C R to denote the support of the vector function g, i.e., g : ^ C R ^ R^. We assume 
that there exists a partition {Bj}^j^^ of A (i.e., A = U^j^-^Bj and Bi D Bj = 0, Vz ^ j) such that the 
subsets Bj are intervals in R and we can define functions gij : 0j ^ R, j = 1, . . . , g and z = 1, . . . , n, 
as 

9ij{x) - 9i{x), Vx G Bj, (11) 

i.e., Qi^j is the restriction of gi to the interval Bj. We further assume that (a) every function gi^j is invertible 

in Bj and (b) every function gi^j is either convex in Bj or concave in Bj. Assumptions (a) and (b) together 

mean that, for every i and all x e Bj, the first derivative is either strictly positive or strictly negative 

and the second derivative is either non-negative or non-positive. As a consequence, there are 

exactly n simple estimates (one per observation) in each subset of the partition, namely Xij = gj^jiyi) 

for z = 1, . . . , n. We write the set of simple estimates in Bj as Xj = {xij, . . . , Xnj}. Due to the additivity 

of the noise in (1), if gij is bounded there may be a non-negligible probability that Yi > max gij{x) 
J ' xe[Bj] 

(or Yi < min gi j{x)), where [Bj] denotes the closure of set Bj, hence g^Uyi) may not exist for 

xe[Bj] ' '-^ 

some realization Yi = yi. In such case, we define Xij = arg max gij{x) (or Xij = arg min gij{x), 

xe[Bj] ' ' xe[Bj] 

respectively), and admit Xij = +oc (respectively, Xij = — oc) as valid simple estimates. 



7 



IV. Computation of upper bounds on the likelihood 

A. Basic method 

Let y be an arbitrary but fixed realization of the observation vector Y. Our goal is to obtain an 
analytical method for the computation of a scalar 7(y) G M such that 7(y) < inf Vix^y^g). Hereafter, 
we omit the dependence on the observation vector and write simply 7. The main difficulty to carry out 
this calculation is the nonlinearity g, which renders the problem not directly tractable. To circumvent this 
obstacle, we split the problem into q subproblems and address the computation of bounds for each set 
j = 1, . . . , in the partition of A. Within Bj, we build adequate linear functions {ri^j}^^^ in order 
to replace the nonlinearities {^ijl^^^. We require that, for every r^^, the inequalities 

\yi - ri^j{x)\ < \yi - gij{x) \ , and (12) 

iVi - rij{x)){yi - gij{x)) > (13) 

hold jointly for alH = 1, . . . , n, and all x e Ij C Bj, where Ij is any closed interval in Bj such that 

Xj G arg min V^(x;y, g) (i.e., any ML estimator of the Sol X restricted to Bj, possibly non unique) is 

xe[Bj] 

contained in Ij. The latter requirement can be fulfilled if we choose Ij = [min(Xj) ^ mSix(Xj)] (see the 

Appendix for a proof). 

If ( [T2I ) and ( [T3] ) hold, we can write 

Viivi - rij{x)) < Vi{yi - gij{x)), Vx G Ij, (14) 

which follows easily from the properties (PI) and (P2) of the marginal potential functions Vi as 



described in Section 



II-B 



Moreover, since y(x;y,g-) = + Y.'i=i ^iiVi - 9ij{x)) and V{x;y,rj 



Cn + X^iLi^(^^ ~ ^ij(^)) (this function will be subsequently referred as the modified system 
potential) where gj{x) = [^ij(x), . . . ,^^j(x)] and rj{x) = [rij(x), . . . ,r^j(x)], Eq. (14) implies that 
l/(x;y, Fj) < V{x;y,gj), Vx G Ij, and, as a consequence, 

7j = inf V{x;y,rj) < inf V{x;y,gj) = inf V{x;y,g). (15) 

x^Zj xEZj x^Bj 

Therefore, it is possible to find a lower bound in Bj for the system potential V{x; y, g^), denoted 7^, by 
minimizing the modified potential y(x;y, r^) in Ij. 

All that remains is to actually build the linearities {rij}^^^. This construction is straightforward and 
can be described graphically by splitting the problem into two cases. Case 1 corresponds to nonlinearities 
gij such that ^^''^^^^ x ^ ^^x^^^ — ^ (^•^•' ^^^3 either increasing and convex or decreasing and concave). 
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while case 2 corresponds to functions that comply with ^^'^'^^^^ x ^ ^^^2^"^ < (i.e., gij is either increasing 
and concave or decreasing and convex), when x e Bj. 

Figure [T](a)-(b) depicts the construction of r^j in case 1. We choose a linear function j that connects 
the point (min (A'j), ^(min (A'j))) and the point corresponding to the simple estimate, {xij^ g{xij)). In 
the figure, dr and dg denote the distances \yi — rij(x)\ and \yi — gij{x)\, respectively. It is apparent that 



dr < dg for all x G Xj, hence inequality (12) is granted. Inequality ( [13] ) also holds for all x E Xj, since 
rij{x) and gij{x) are either simultaneously greater than (or equal to) yi, or simultaneously lesser than 
(or equal to) yi. 

Figure [T](c)-(d) depicts the construction of r^j in case 2. We choose a linear function j that connects 
the point (max (A'j), ^(max (A'j))) and the point corresponding to the simple estimate, (xij^ g{xij)). 
Again, dr and dg denote the distances \yi — rij{x)\ and \yi — gij{x)\, respectively. It is apparent from 
the two plots that inequalities (12) and (13) hold for all x G Xj. 



A special subcase of 1 (respectively, of 2) occurs when Xi = min (Xj) (respectively, Xij = max (Xj)). 
Then, rij{x) is the tangent to gij{x) at Xij. If Xij = =boc then rij{x) is a horizontal asymptote of 

It is often possible to find 7j = inf y(x;y, r^) < inf V^(x;y, g ) in closed-form. If we choose 
7 = min7j, then 7 < inf V^(x,y, g) is a global lower bound of the system potential. Table [l| shows an 
outline of the proposed method, that will be subsequently referred to as bounding method 1 (BMl) for 



conciseness. 



TABLE I 
Bounding Method 1. 



1. Find a partition {BjY-^-^ of the space of the Sol. 

2. Compute the simple estimates Xj = . . . , Xn,j} for each Bj. 

3. Calculate Xj = [min(A'j), max(A'j)] and build rij{x), for x G Xj and i = 1, . . . , n. 

4. Replace gj{x) with rj(x), and minimize V{x;y,rj) to find the lower bound 7^. 

5. Find 7 = min7j. 



B. Iterative Implementation 

The quality of the bound 7^ depends, to a large extent, on the length of the interval Xj, denoted 
This is clear if we think of rij{x) as a linear approximation on Ij of the nonlinearity gij{x). Since we 
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Fig. 1. Construction of the auxiliary linearities {rij}^^^. We indicate dr = \yi — rij(x)\ and dg = \yi — gij(x)\, respectively. 
It is apparent that dr < dg and rij{x) and gi,j{x) are either simultaneously greater than (or equal to) or simultaneously 
lesser than (or equal to) yi, for all x G X,. Hence, the inequalities |T2| and |T3] l are satisfied Vx G 2j. (a) Function gij is 
increasing and convex (case 1). (b) Function gij is decreasing and concave (case 1). (c) Function gij is decreasing and convex 
(case 2). (d) Function gij is increasing and concave (case 2). 



have assumed gij{x) is continuous and bounded in Xj, the procedure to build rij{x) in BMl imphes 
that 

lim \gij{x) - rij{x)\ < lim | sup gij{x) - inf gij{x) \ = 0, (16) 

for all X G Xj. Therefore, if we consider intervals Ij which are shorter and shorter, then the modified 
potential function V{x;y^rj) will be closer and closer to the true potential function V{x;y^gj), and 
hence the bound 7j < y(x;y, r^) < V{x;y,gj) will be tighter. 

The latter observation suggests a procedure to improve the bound 7j for a given interval Ij. Indeed, 
let us subdivide Ij into k subintervals denoted Iv,v+i — [sy^ Sy^i] where = 1, . . . , A: and 5^, 5^+1 G Ij. 
We refer to the elements in the collection Sj^k = {-^i, • • • , 5/e+i}, with 5i < 52 < . . . < s^^i, as support 
points in the interval Ij. We can build linear functions r^^^ = [r[^j, . . . , r^^j] for every subinterval Iv,v+i, 



using the procedure described in Section |IV-A| We recall that this procedure is graphically depicted in 
Fig. [1} where we simply need to 
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• substitute Xj by Tv,v+i and 

• when the simple estimate ^ X^^^+i, substitute Xij by 5^ (x^^ by 5^+1) if Xij < Sy (if 
Xij > Sy^i, respectively). 

Using r^^^ we compute a bound 7 -^^ v = 1, . . . , fc, and then select ^yj^k = i^iin T.-^^- Note that 
the subscript k in 7^ indicates how many support points have been used to computed the bound 
in Ij (which becomes tighter as k increases). Moreover if we take a new (arbitrary) support point 
5* from the subinterval Xy*^^*+i that contains 7^ ^^, and extend the set of support points with it, 
Sj^k+i = {<si, . . . , 5*, . . . , with 5i < 52 < . . . < 5* < . . . < 5/e+2. then we can iterate the 

proposed procedure and obtain a refined version of the bound, denoted 7j,fc+i- 

The proposed iterative algorithm is described, with detail, in Table [ll| Note that k is an iteration 
index that makes explicit the number of support points Sy. If we plug this iterative procedure for the 
computation of 7^ into BMl (specifically, replacing steps 3 and 4 of Table [l]), we obtain a new technique 
that we will hereafter term bounding method 2 (BM2). 

As an illustration. Figure [2] shows four steps of the iterative algorithm. In Figure [2] (a) there are two 
support points Sj^i = {min{Xj)^ max(A'j)}, which yield a single interval Ii^2 = ^j- In Figures [2] (b)- 
(c)-(d), we successively add a point 5* chosen in the interval that contains the latest bound. In 

this example, the point 5* is chosen deterministically as the mean of the extremes of the interval ly* 

TABLE II 

Iterative algorithm to improve 7^ . 



1. Start with Xi,2 = ^j, and Sj,i = {min (Afj), max {^j)}. Let v"" = 1 and /c = 1. 

2. Choose an arbitrary interior point s* inX^*,^*+i, and update the set of support points <Sj,fc = <Sj,fc-i U {s*}. 

3. Sort Sj^k in ascending order, so that Sj^k = {si, • • • , Sfc+i} where si = mm{Xj), Sk+i = (max Afj), 
and /c + 1 is the number of elements of Sj^k- 

4. Build r^j"\x) for each interval Xv,v-\-i = [sv, s^^+i] with ^; = 1, . . . , /c. 

5. Find 7^^^* = min V{x; y, r^-^'*), for ^; = 1, . . . , /c. 

6. Set the refined bound jj^k = min 7-^^ and set V:^. = argmin7 -^\ 

ve{i,...,k} V 

7. To iterate, go back to step 2. 
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Fig. 2. Four steps of the iterative algorithm choosing s* as the middle point of the subinterval The solid line shows 

the system potential V{x;y,g) = {yi — exp (x))^ — log (2/2 — exp (— x) + 1) + (2/2 — exp (— x)) + 1 (see the example in Section 
VII- A| ), with 2/1=5 and 1/2 = 2, while the dashed line shows the modified potential V{x; y, r^). We start in plot (a) with two 
points Sj,i — {min (Afj), max (Afj)}. At each iteration, we add a new point chosen in the subinterval Xv*,v*+i that contains 
the latest bound. It is apparent that V{x;y, rj) becomes a better approximation of V{x; y, g^) each time we add a new support 
point. 



C. Lower bound 72 for quadratic potentials 

Assume that the joint potential is quadratic, i.e., 1^2^^^ (^1 ~ 9ij{x)i • • • iVn — 9nj{x)) = ~ 
gi^j{x)Y for each j = 1, . . . , and construct the set of linearities rij{x) = aijx + bij, for z = 1, . . . , n 
and j = 1, . . . , The modified system potential in Bj becomes 

n n 

V2{x]y, Yj) = ^{Vi - rij{x)f = "^{Vi - dijx - bijf, (17) 
i=i i=i 

and it turns out straightforward to compute 72 j = min y, r^). Indeed, if we denote = 
[aij, . . . , anj]^ and Wj = [yi — . . . — bnj]'^ , then we can readily obtain 

Xj = argminy(x;y,rj) = -4- — , (18) 
xeBj aj Sij 

and 72 J = y(xj;y, Fj). It is apparent that 72 = min 72 j < V{x;y^g). Furthermore, xj is an 

j 

approximation of the ML estimator xj restricted to Bj. 

D. Adaptation of 72 for generic system potentials 

If the joint potential is not quadratic, in general it can still be difficult to minimize the modified function 
l/(x;y, r), despite the replacement of the nonlinearities gij{x) with the linear functions rij{x). In this 
section, we propose a method to transform the bound for a quadratic potential, 72, into a bound for some 
other, non-quadratic, potential function. 

Consider an arbitrary joint potential V^^"^ and assume the availability of an invertible increasing function 
R such that i?ol/(^) > V2^\ where o denotes the composition of functions. Then, for the system potential 
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we can write 



(i?ol/)(x;y,g) > V^''\yi - gi{x), . . . ,yn - gn{x)) 

n 



(19) 



and, as consequence, V{x] y, g) > i? ^ (72) = 7, hence 7 is a lower bound for the non-quadratic system 
potential V^(x;y, g) constructed from 

(n) 

For instance, consider the family of joint potentials Vp . Using the monotonicity of CP norms, it is 
possible to prove |[T6ll that 



V in 



E|t?,r > ' fo'" 0<P<2, and 



(20) 



(21) 



Let R\{v) — v^/^. Since this function is, indeed, strictly increasing, we can transform the inequality (20) 
into 

/ n \ n 

" " " (22) 



Rl[Y.\yi-g^{x)\^\ >J2(y-9iix)f, 

\i=l J i=l 

which yields 



p/2 



(23) 



i=l 



v/2 (n) 

hence the transformation 72 of the quadratic bound 72 is a lower bound for Vp ^ with < p < 2. 

v^l^ ) , the inequality J2l| yields 



Similarly, if we let i?2(^) = \ n\^'p ) 

n / n \ 

\yi - g^{x)r > Y.(yi - g,{x)f = 



i=l 



1/2 



> n(-^)^/^ 



(24) 



hence the transformation R2 ^(72) — n ^'^ ^)/^72^^ is a lower bound for Vp^^ when 2 < p < +oc. 

It is possible to devise a systematic procedure to find a suitable function R given an arbitrary joint 
potential y(^)(^), where ^ = Let us define the manifold 4 G : y(^)(i?) = v}. 

We can construct R by assigning i?(i;) with the maximum of the quadratic potential i9? when i9 E F^, 
i.e., we define 

n 

R{v)=maxS2^l (25) 



Note that (25 ) is a constrained optimization problem that can be solved using, e.g., Lagrangian multipliers. 



13 



From the definition in ([25]) we obtain that, Vi? G Ty, R{v) > Yli=i ^i- In particular, since V^''\i9) = v 
from the definition of T^, we obtain the desired relationship, 

n 

i?(y(")(t?i,...,t?„)) >E^^ (26) 

i=l 

We additionally need to check whether i? is a strictly increasing function of v. The two functions in the 
earlier examples of this Section, Ri and i?2, can be readily found using this method. 

E. Convex marginal potentials Vi 

Assume that A = {Sj}^^^ and that we have already found rij{x) = aijx + bij, i = 1, . . . , n and 



j = 1, . . . , using the technique in Section IV- A If a marginal potential Vi(^i) is convex, the function 
Vi{yi — rij{x)) is also convex in Bj. Indeed, for all x G Bj 



dx^ dx^ d'di \ dx ) d'dj ' d^l 

(Pr 

where we have used that = (since ri^j is linear). 

As a consequence, if all marginal potentials Vi{'di) are convex, then the modified system potential. 



V{x] y, Yj) = + Y17=i ^iiVi ~ ^1^0 convex in Bj. This is easily shown using (27), to obtain 



= E-^^>0' VxGS,. (28) 

i=l ^ 

Therefore, we can use the tangents to y(x;y, r^) at the limit points of Ij (i.e, min(A:'j) and mSix{Xj)) 
to find a lower bound for the system potential V{x;y,gj). Figure |3] (left) depicts a system potential 
V(x; y, g^) (solid line), the corresponding modified potential V(x; y, rj) (dotted line) and the two tangent 
lines at min(A'j) and max(A'j). It is apparent that the intersection of the two tangents yields a lower 
bound in Bj. Specifically, if we let W{x) be the piecewise-linear function composed of the two tangents, 
then the inequality V{x;y^gj) > y(x;y, r^) > W{x) is satisfied for all x G Ij. 

V. Adaptive Rejection Sampling 

The adaptive rejection sampling (ARS) [[8l| algorithm enables the construction of a sequence of proposal 
densities, {7rt(x)}^^^, and bounds tailored to the target density. Its most appealing feature is that each 
time we draw a sample from a proposal nt and it is rejected, we can use this sample to build an improved 
proposal, TT^+i, with a higher mean acceptance rate. 

Unfortunately, this attractive ARS method can only be applied with target pdf 's which are log-concave 
(hence, unimodal), which is a very stringent constraint for may practical applications. Next, we briefly 
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review the ARS algorithm and then proceed to introduce its extension for non-log-concave and multimodal 
target densities. 

Let p{x\y) denote the target pdi[^ The ARS procedure can be applied when log[p(x|y)] is concave, 
i.e., when the potential function V{x; y, g) = — log[p(x|y)] is strictly convex. Let St = {^i, 52, ... , s^^} 
be a set of support points in the domain D of V^(x;y, g). From St we build a piecewise-linear lower 
hull of V{x; y, g), denoted Wt{x), formed from segments of linear functions tangent to V{x; y, g) at the 
support points in St. Figure |3] (center) illustrates the construction of Wt{x) with three support points for 
a generic log-concave potential function V^(x;y, g). 

Once Wt{x) is built, we can use it to obtain an exponential-family proposal density 

7Tt{x) = ctexp[-Wt{x)i (29) 

where ct is the proportionality constant. Therefore 7Tt{x) is piecewise-exponential and very easy to sample 
from. Since Wt{x) < T^(x;y, g), we trivially obtain that ^vr(x) > p{x\y) and we can apply the RS 
principle. 

When a sample x^ from 7rt{x) is rejected we can incorporate it into the set of support points, 
5^+1 = St U {x'} (and kt^i = kt + 1). Then we compute a refined lower hull, Wt^i{x), and a new 



proposal density 7Tt^i{x) = q+i exp{— W^+i(x)}. Table III summarizes the ARS algorithm. 



TABLE III 

Adaptive Rejection Sampling Algorithm. 



1. Start with t = 0, <So = {si, S2} where si < S2, and the derivatives of V{x,y,g) in si, S2 ^ D having different signs. 

2. Build the piecewise-linear function Wt{x) as shown in Figure jsj (center), using the tangent lines to V{x;y,g) 
at the support points St. 

3. Sample x' from 7rt{x) oc exp{— Wt(x)}, and u from U{[0, 1]). 

4. If u < expf-vJjcxO] ^^^^P^ ^' <St+i = St, kt+1 = kt. 

5. Otherwise, if u' > expf-vJf(ccO] ' ^^J^^^ St-\-i = <St U {x'} and update kt-^i = /ct + 1. 

6. Sort St+i in ascending order, increment t and go back to step 2. 



VI. Generalization of the ARS Method 

In this section we introduce a generalization of the standard ARS scheme that can cope with a broader 
class of target pdf's, including many multimodal distributions. The standard algorithm of |8|, described 

^The method does not require that the target density be a posterior pdf, but we prefer to keep the same notation as in the 
previous section for coherence. 
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in Table IllTl is a special case of the method described below. 



A. Generalized adaptive rejection sampling 

We wish to draw samples from the posterior p{x\y). For this purpose, we assume that 

• all marginal potential functions, Vi(i9i), z = 1, . . . , n, are strictly convex, 

• the prior pdf has the form p{x) oc exp{— V^+i(a^ — x)}, where V^+i is also a convex marginal 
potential with its mode located at /i, and 

• the nonlinearities gi{x) are either convex or concave, not necessarily monotonic. 

We incorporate the information of the prior by defining an extended observation vector, y = 
[yi, . . . , yn, Vn+i = m]^, and an extended vector of nonlinearities, g(x) = [gi{x), . . . , gn{x) , gn+i{x) = 
x]^ . As a result, we introduce the extended system potential function 

V{x; y, g) ^ V{x; y, g) + Vn+i (/x - x) = - log[p(x|y)] + cq, (30) 

where cq accounts for the superposition of constant terms that do not depend on x. We remark that the 
function V^(x;y, g) constructed in this way is not necessarily convex. It can present several minima and, 
as a consequence, p{x\y) can present several maxima. 

Our technique is adaptive, i.e., it is aimed at the construction of a sequence of proposals, denoted 
7Tt{x), t G N, but relies on the same basic arguments already exploited to devise the BMl. To be specific, 
at the t-th iteration of the algorithm we seek to replace the nonlinearities {gi}^^i by piecewise-linear 
functions {ri^t}i'=i in such a way that the inequalities 

\yi - n,t(^)| < \yi - gi{x)\ and (31) 

iVi - ri^t{x)){yi - gi{x))>{) (32) 



are satisfied Vx G M. Therefore, we repeat the same conditions as in Eqs. ([T2|)-([T3[) but the derivation of 
the generalized ARS (GARS) algorithm does not require the partition of the Sol space, as it was needed 
for the BMl. 

We will show that it is possible to construct adequate piecewise-linear functions of the form 

J max[n,i(x),...,n,K^(x)], if is convex 
ri^t{x) = < _ _ (33) 

y min[ri,i(x), . . . , ri^xXx)], if gi is concave 

where i = 1, . . . , n and each fij{x), j = 1, . . . , K^, is a purely linear function. The number of linear 

functions involved in the construction of n,t(x) at the t-th iteration of the algorithm, denoted Kt, 

determines how tightly 7Tt{x) approximates the true density p{x\y) and, therefore, the higher Kt, the 
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higher expected acceptance rate of the sampler. In Section |VI-B| below, we explicitly describe how to 



choose the linearities fij{x), j = 1, . . . in order to ensure that (31) and (32) hold. We will also 
show that, when a proposed sample is rejected, Kf can be increased (Kf^i = + 1) to improve the 
acceptance rate. 

Let = [ri^t(x), . . . , ^(x), r^+i^t(x) = x]^ be the extended vector of piecewise-linear functions. 



that yields the modified potential y(x;y, f^). The same argument used in Section IV- A to derive 
the BMl shows that, if ^ and ([32]) hold, then y(x;y,ft) < V{x;y,g), Vx G M. Finally, we 
build a piecewise-linear lower hull Wt{x) for the modified potential, as explained below, to obtain 

Wt{x)<V{x;y,rt)<V{x;y,g). 



The definition of the piecewise-linear function ri^t{x) in (33) can be rewritten in another form 

r^t{x) = ri^j{x) for x G [a, 6] 



(34) 



where a is the abscissa of the intersection between the linear functions j_i(x) and fij(x), and b is 
the abscissa of the intersection between fij{x) and Therefore, we can define the set of all 

abscissas of intersection points 



£t = {ueR: rij{u) = rij^i{u) for z = 1, . . . , n + 1, j = 1, . . . , - 1}, 
and sort them in ascending order 

Ui <U2 < . <UQ 

where Q is the total number of intersections. Then 



(35) 



(36) 



a) since we have assumed that the marginal potentials are convex, we can use Eq. p4} and the argument 
to show that the modified function V{x; y, ft) is convex in each interval [uq, ^g+i], 



of Section 



IV-E 



with g = 1, . . . , Q, and, 

b) as a consequence, we can to build Wt{x) by taking the linear functions tangent to y(x;y, f^) at 
every intersection point i/g, g = 1, . . . , Q. 
Fig. [3] (right) depicts the relationship among V^(x;y, g), y(x;y, f^) and Wt{x). Since Wt{x) is 
piecewise linear, the corresponding pdf 7Tt{x) oc exp{—Wt{x)} is piecewise exponential and can be 
easily used in a rejection sampler (we remark that Wt{x) < V^(x;y, g), hence 7Tt{x) oc exp{—Wt{x)} > 
exp{-y(x;y,g)} (xp(x|y)). 

Next subsection is devoted to the derivation of the linear functions needed to construct f^. Then, we 
describe how the algorithm is iterated to obtain a sequence of improved proposal densities and provide 
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a pseudo-code. Finally, we describe a limitation of the procedure, that yields improper proposals in a 
specific scenario. 




Fig. 3. Left: The intersection of the tangents to y(x;y, r^) (dashed line) at mm{Xj) and max(A'j) is a lower bound for 
^(^^y^gj) (solid line). Moreover, note that the resulting piecewise-linear function W{x) satisfies the inequality V{x;y,gj) > 
y(x;y, Fj) > W{x), for all x G 2j . Center: Example of construction of the piecewise-linear function Wt{x) with 3 support 
points St = {si, S2, S3}, as carried out in the ARS technique. The function Wt{x) is formed from segments of linear functions 
tangent to V{x; y, g) at the support points in St. Right: Construction of the piecewise linear function Wt{x) as tangent lines to 
the modified potential y(x;y, ft) at three intersections points ui, U2 and u^, as carried out in the ARS technique. 



B. Construction of linear functions rij(x) 

A basic element in the description of the GARS algorithm in the previous section is the construction 
of the linear functions fij{x). This issue is addressed below. For clarity, we consider two cases 
corresponding to non-monotonic and monotonic nonlinearities, respectively. It is important to remark 
that the nonlinearities ^^(x), z = 1, . . . , n (remember that gn+i{x) = x is linear), can belong to different 
cases. 

1) Non-monotonic nonlinearities: Assume gi{x) is a non-monotonic, either concave or convex, 
function. We have three possible scenarios depending on the number of simple estimates for gi{x): 
(a) there exist two simple estimates, Xi^i < Xi^2, (b) there exists a single estimate, Xi^i = Xi^2, or (c) there 
is no solution for the equation yi = gi{x). 

Let us assume that Xi^i < Xi^2 and denote Ji = [xi^i^Xi^2]' Let us also introduce a set of support 
points St = {51, . . . , that contains at least the simple estimates and an arbitrary point s G Ji, i.e., 
^i,i5^i,2 ^ Sf. The number of support points, kt, determines the accuracy of the approximation of the 
nonlinearity gi{x) that can be achieved with the piecewise-linear function ri^t{x). In Section VI-C we 



show how this number increases as the GARS algorithm iterates. Now, we assume it is given and fixed. 

Figure [4] illustrates the construction of fij{x), j = 1, . . . , where Kt = h — I, and ri^t{x) for a 
convex nonlinearity gi{x) (the procedure is completely analogous for concave gi{x)). Assume that the 
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two simple estimates Xi^i < Xi^2 exist, hence \Ji\ > 0. For each j G {1, . . . , /c^}, the linear function 
fij{x) is constructed in one out of two ways: 

(a) if [sj^Sj^i] C J^, then fij{x) connects the points {sj^gi{sj)) and (sj+i, ^^(sj+i)), else 

(b) if Sj ^ Ji, then fij{x) is tangent to gi{x) at x = sj. 

From Fig. [4] (left and center) it is apparent that ri^t{x) = max[fi^i(x), . . . , built in this way 

satisfies the inequalities (31) and ( [32| , as required. For concave gi{x), (31) and ( [32| are satisfied if we 
choose ri^t(x) = min[fi^i(x), . . . , fi^x^(x)]^. 

When |J7^| = (i.e., Xi^i = Xi^2 or there is no solution for the equation yi = gi{x)), then each fij{x) 
is tangent to ^i(x) at x = 5j, Vsj G 5t, and in order to satisfy (31) and (32), we need to select 



ri^t{x) = 
as illustrated in Fig. |4] (right). 



max[n^i(x), . . . , ri^Kt(x), yi], if is convex 
min[fi,i(x), . . . , fi^Kt{x),yi], if gi is concave 



(37) 




Fig. 4. Construction of the piecewise linear function ri,t{x) for non-monotonic functions. The straight lines fij{x) 
form a piecewise linear function that is closer to the observation value yi (dashed line) than the nonlinearity gi{x), i.e., 
\yi — ri,t{x)\ < \yi — gi{x)\. Moreover, ri,t{x) and gi{x) are either simultaneously greater than (or equal to) yi, or 
simultaneously lesser than (or equal to) yi, i.e., {yi — ri^t{x)){yi — gi{x)) > 0. Therefore, the inequalities ([sTJ and ( [32] ) 
are satisfied. The point (sj, gi{sj)), corresponding to support point sj, is represented either by a square or a circle, depending 
on whether it is a simple estimate or not, respectively. Left: construction of ri,t{x) with kt =^ 4 support points when the 
nonlinearity gi{x) is convex, therefore ri,t{x) = max[fi,i(x), . . . ,fi,3(x)] (Kt = /ct — 1 = 3). We use the tangent to gi{x) 
at X = S4 because ^ Ji — [51,53], where s\ — Xi^i and S3 = Xi,2 are the simple estimates (represented with squares). 
Center: since the nonlinearity gi{x) is concave, ri,t(x) = min[fi,i(x), . . . ,f^,3(x)]. We use the tangent to gi{x) at S4 because 
s\ ^ Ji — [s2, S4], where S2 = and S4 = Xi,2 are the simple estimates (represented with squares). Right: construction of 
the ri,t(x), with two support points, when there are not simple estimates. We use the tangent lines, but we need a correction 
in the definition of ri^t(x) in order to satisfy the inequalities |3TJ and |32] ). Since gi(x) in the figure is convex, we take 
ri,t(x) = max[fi,i(x), . . . , fi,^^ (x), 2/^]. 
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2) Monotonic nonlinearities: In this case gi{x) is invertible and there are two possibilities: there exists 
a single estimate, Xi = g^^{yi), or there is no solution for the equation i/i = gi{x) (where i/i does not 



belong to the range of gi{x)). Similarly to the construction in Section IV-A we distinguish two cases 



(a) if 



dgi{x) (Pgi{x) 



dx 



dx^ 



> 0, then we define Ji = (— oo, x^], and 



(b) if ^ X < 0, then we define J, ^ [x„ +oc). 

The set of support points is St = {^i, . . . ^s^^}, with si < S2 • • • < s^^, and includes at least the simple 
estimate Xi and an arbitrary point s e Ji, i.e., Xi^s G St. 

The procedure to build fij{x), for j = 1, . . . , Kt, with Kt = kt, is similar to Section 



VI-Bl 



Consider 



case (a) first. For each j G {2, . . . , kt}, if [sj-i, sj] C Ji = (— oo, Xi], then fij{x) is the linear function 
that connects the points {sj-i^ gi{sj-i)) and {sj^gi{sj)). Otherwise, if Sj ^ Ji ^ (— oo,Xi], fij{x) is 
tangent to gi{x) at x = 5j. Finally, we set = ^^(si) for all x G M. The piecewise linear function 

Ti^t is ri^t{x) = max[f^^i(x), . . . ,fi,Xt(^)]- This construction is depicted in Fig. |5] (left). 

Case (b) is similar. For each j G {1, . . . , fc^}, if [sj, d Ji ^ [x^, +oo), then fij{x) is the linear 
function that connects the points {sj,gi{sj)) and (5^+1,^^(5^+1)). Otherwise, if sj ^ 2^ = [xi,+(X)), 
fij{x) is tangent to ^i(x) at x = 5j. Finally, we set fi^kA^) = 9i{^kt) (remember that, in this case, 
Kt = fct), for all X G M. The piecewise linear function ri^t will be ri^t{x) = min[fi^i(x), . . . ,f^,K,(x)]. 
This construction is depicted in Fig. |5] (right). 



It is straightforward to check that the inequalities p\\ and ( |32| ) are satisfied. Note that, in this case, the 
number of linear functions j (x) coincides with the number of support points. If there is not solution 



for the equation yi = gi{x) (yi does not belong to the range of gi{x)), then (31) and (32) are satisfied if 



we use (37) to build ri^t{x). 



C. Summary 



We can combine the elements described in Sections |VI-B1| and |VI-B2| into an adaptive algorithm that 
improves the proposal density Tit{x) oc exp{— VI/t(x)} each time a sample is rejected. 

Let St denote the set of support points after the t-th iteration. We initialize the algorithm with 
5o = {5j})li such that 

• all simple estimates are contained in 5o, and 

• for each interval = with non-zero length {\Ji\ > 0), there is at least one (arbitrary) 
support point contained in Ji. 



The proposed GARS algorithm is described in Table [iVj Note that every time a sample x' drawn from 
Tit{x) is rejected, x' is incorporated as a support point in the new set 5^+1 = 5t U {x^} and, as a 
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Fig. 5. Examples of construction of the piecewise-linear function ri,t{x) with kt = 3 support points sj, for the two 
subcases. It is apparent that \yi — ri,t(x)| < \yi — gi{x)\ and that ri^t{x) and gi{x) are either simultaneously greater than 
(or equal to) yi, or simultaneously lesser than (or equal to) yi, i.e., {yi — ri,t{x)){yi — gi{x)) > 0. The simple estimates 
are represented by squares while all other support points are drawn as circles. Left: the figure corresponds to the subcase 
1 where r^,^(x) = max[fi,i(x), . . . , fi,3(x)] (Kt — kt — 3). Right: the figure corresponds to to the subcase 2 where 
n,t(x) = min[n,i(x), . . . ,n,3(x)] {Kt = h = 3). 



consequence, a refined lower hull Wt^i{x) is constructed yielding a better approximation of the system 
potential function. In this way, 7Tt^i{x) oc exp{—Wt^i{x)} becomes closer to p{x\y) and it can be 
expected that the acceptance rate be higher. This is specifically shown in the simulation example in 
Section IvTlHI 

TABLE IV 

Steps of Generalized Adaptive Rejection Sampling. 



1. Start with t = set <So = 




2. Build fij (x) for i = 1, . . . , n + 1, j = 1, • • • , Kt, where Kt = kt — 1 or Kt = kt 


depending on whether gi{x) is 


non-monotonic or monotonic, respectively. 




3. Calculate the set of intersection points 8t = {u : fij{u) = fij-^i{u) for i = 


l,...,n + l, j = 


Let Q = \St\ be the number of elements in Sf 




4. Build Wt{x) using the tangent lines to V{x; y, ft) at the points Uq ^ 8t, q = I, . . . 




5. Draw a sample x' from 7Tt{x) oc exp[— Wt(x)]. 




6. Sample u from ^([0, 1]). 




7. If U' < expf-vJjca:')] ^^^^P^ + l = 




8. Otherwise, if u > expf-vJj(a;0] ^^J^^^ update St-\-i = <St U {x^}. 




9. Sort <St+i in ascending order, set t = t + 1 and go back to step 2. 
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D. Improper proposals 



The GARS algorithm as described in Table IV breaks down when every ^^(x), z = 1, . . . , n + 1, is 
nonlinear and convex (or concave) monotonic. In this case, the proposed construction procedure yields 
a piecewise lower hull Wt{x) which is positive and constant in an interval of infinite length. Thus, the 
resulting proposal, Tit{x) oc exp{— is improper (J^^ 7Tt{x)dx +oo) and cannot be used for 
RS. One practical solution is to substitute the constant piece of Wt{x) by a linear function with a small 
slope. In that case, 7rt{x) is proper but we cannot guarantee that the samples drawn using the GARS 
algorithm come exactly from the target pdf. Under the assumptions in this paper, however, gn+i{x) = x 
is linear (due to our choice of the prior pdf), and this is enough to guarantee that 7Tt{x) be proper. 

VII. Examples 

A. Example 1: Calculation of upper bounds for the likelihood function 

Let X be a scalar Sol with prior density X ^ p{x) = N{x] 0, 2) and the random observations 

= exp(X) + ei, y2 = exp(-X) + e2, (38) 

where 6i, 62 are independent noise variables. Specifically, 61 is Gaussian noise with X(i9i;0, 1/2) = 
fci exp { — (i9i)^}, and 62 has a gamma pdf, 62 ^ r(i92;^, A) = A:2'i?2~^ exp {— Ai92}, with parameters 
^ = 2,A = 1. 

The marginal potentials are Vi(i9i) = ^\ and ^2(^2) = —log (^2) + ^2- Since the minimum of 
^2(^2) occurs in i?2 = 1, we replace I2 with the shifted observation Y2 = exp(— X) + 63, where 
= y2 - 1, = 82 - 1. Hence, the marginal potential becomes ^2(^2) = " log(^2 + 1) + ^2 + 1' 
with a minimum at ^2 ^ 0' vector of observations is Y = [Yi,y2*]^ ^nd the vector of 
nonlinearities is g(x) = [exp (x), exp (— x)]^. Due to the monotonicity and convexity of gi and 
g2, we can work with a partition of R consisting of just one set, Bi = M. The joint potential is 
1/(2) (^^ ^ ^*) ^ ^2^^ v-(^.) = ^2 _ 1^^^* + 1) + ^* + 1 and the system potential is 

V{x]y,g) = l/(^^(yi - exp {x),yl - exp(-x)) = 

(39) 

= {yi - exp {x)f - \ogiyl - exp (-x) + 1) + {yl - exp (-x)) + 1. 

Assume that, Y = y = [2,5]^. The simple estimates are X = {xi = log(2),X2 = — log(5)}, and, 
therefore, we can restrict the search of the bound to the interval X — [min(A:') = — log(5), max(A:') = 
log(2)] (note that we omit the subscript because we have just one set, Bi = R). Using the BMl technique 



in Section IV-A we find the linear functions ri(x) = 0.78x + 1.45 and r2(x) = — 1.95x + 1.85. 
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In this case, we can analytically minimize the modified system potential, to obtain x = —0.4171 

with R~'^(v) 



argminy(x,y, r). The associated lower bound is 7 = y(x,y, r) = 2.89 (the true global minimum 



IV-D 



of the system potential is 3.78). We can also use the technique in Section 
— log(v^ + 1) + + 1- The lower bound for the quadratic potential is 72 = 2.79 and we can readily 
compute a lower bound 7 = i?~^(72) = 1.68 for V^(x;y, g). Since the marginal potentials are both 
convex, we can also use the procedure described in Section [lV-E[ obtaining the lower bound 7 = 1.61. 

Figure [6] (a) depicts the system potential V^(x;y, g), and the lower bounds obtained with the three 
methods. It is the standard BMl algorithm that yields the best bound. 



In order to improve the bound, we can use the iterative BM2 technique described in Section IV-B 
With only 3 iterations of BM2, and minimizing analytically the modified potential y(x,y, r), we find 
a very tight lower bound 7 = min(y(x, y, r)) = 3.77 (recall that the optimal bound is 3.78). Table [V 
summarizes the bounds computed with the different techniques. 

Next, we implement a rejection sampler, using the prior pdf p{x) = A^(x;0,2) oc exp{— as a 
proposal function and the upper bound for the likelihood L = exp{— 3.77}. The posterior density has 
the form 

p{x\y) (X p{y\x)p{x) = exp{-y(x; y, g) - x'^/A}. (40) 

Figure [6] (b) shows the normalized histogram of = 10, 000 samples generated by the RS algorithm, 
together with the true target pdf p{x\y) depicted as a dashed line. The histogram follows closely the shape 
of the true posterior pdf. Figure [6] (c) shows the acceptance rates (averaged over 10, 000 simulations) as 
a function of the bound 7. We start with the trivial lower bound 7 = and increase it progressively, up 
to the global minimum 7 = 3.78. The resulting acceptance rates are 1.1% for the trivial bound 7 = 0, 
18% with 7 = 2.89 (BMl) and approximately 40% with 7 = 3.77 (BM2). Note that the acceptance rate 
is ^41% for the optimal bound and we cannot improve it any further. This is an intrinsic drawback of 
a rejection sampler with constant bound L and the principal argument that suggests the use of adaptive 
procedures. 

TABLE V 

Lower bounds of the system potential function. 



Method 


BMl 


BMl + trasformation R 


BMl + tangent lines 


BM2 


Optimal Bound 


Lower Bound 7 


2.89 


1.68 


1.61 


3.77 


3.78 
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(a) (b) (c) 




X X lower bound 



Fig. 6. (a) The system potential F (x, y, g) (solid), the modified system potential y, r) (dashed), function (i?~^ o V2) (x, y, r) 
(dot-dashed) and the piecewise-linear function W{x) formed by the two tangent lines to y(x,y, r) at min(A') and max(A') 
(dotted). The corresponding bounds are marked with dark circles, (b) The target density p{x\y) oc p{y\x)p{x) (dashed) and the 
normalized histogram of = 10, 000 samples using RS with the the calculated bound L. (c) The curve of acceptance rates 
(averaged over 10, 000 simulations) as a function of the lower bound 7. The acceptance rate is 1.1% for the trivial bound 7 = 0, 
18% with 7 = 2.89, approximately 40% with 7 = 3.77 and 41% with the optimal bound 7 = 3.78. 



B. Example 2: Comparison of ARMS and GARS techniques 

Consider the problem of sampling a scalar random variable X from a posterior bimodal density 
p{x\y) oc p{y\x)p{x), where the likelihood function is p{y\x) oc exp{— cosh(?/ — x^)} (note that we 
have a single observation Y = yi) and prior pdf is p{x) oc exp{— 0^(77 — exp(|x|))^}, with constant 
parameters a > and 77. Therefore, the posterior pdf is p{x\y) oc exp {—V{x] y, g)}, where y = 77]^, 
g(x) = [gi{x) ^ g2{x)Y = [x^, exp(|x|)]^ and the extended system potential function becomes 

V{x]y,g) = cosh(^ - x^) + a{r] - exp(|x|))2. (41) 

The marginal potentials are Vi{'di) = cosh(79i) and 1^2(^2) = <^^2- Note that the density p{x\y) is an 
even function, p{x\y) = p{—x\y), hence it has a zero mean, = J xp{x\y)dx = 0. The constant a 
is a scale parameter that allows to control the variance of the random variable X, both a priori and a 
posteriori. The higher the value of a, the more skewed the modes of p{x\y) become. 

There are no standard methods to sample directly from p{x\y). Moreover, since the posterior density 
p{x\y) is bimodal, the system potential is non-log-concave and the ARS technique cannot be applied. 
However, we can easily use the GARS technique. If, e.g., Y = y = [y = 5,77 = 10]^ the simple 
estimates corresponding to gi{x) are xi^i = —a/5 and xi^2 = a/5, so that Ji = [—a/5, a/5]. In the same 
way, the simple estimates corresponding to g2{x) are X2,i = — log(lO) and ^2,2 = log(lO), therefore 
J2 = [-log(10),log(10)]. 

An alternative possibility to draw from this density is to use the ARMS method LHJ . Therefore, in 
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this section we compare the two algorithms. Specifically, we look into the accuracy in the approximation 
of the posterior mean = by way of the sample mean estimate, A = ;^ J2iLi different values 

of the scale parameter a. 

In particular, we have considered ten equally spacial values of a in the interval [0.2, 5] and then 
performed 10, 000 independent simulations for each value of a, each simulation consisting of drawing 
5, 000 samples with the GARS method and the ARMS algorithm. Both techniques can be sensitive to their 
initialization. The ARMS technique starts with 5 points selected randomly in [—3.5,3.5] (with uniform 
distribution). The GARS starts with the set of support points Sq = {a:^2,i, <§, xi^2, ^2,2} sorted in 
ascending order, including all simple estimates and an arbitrary point s needed to enable the construction 
Point s is randomly chosen in each simulation, with uniform pdf in Ji = [xi^i,xi^2]- 



in Section 



VI-B 



The simulation results show that the two techniques attain similar performance when a G [0.2, 1] (the 
modes of p{x\y) are relatively flat). When a G [1,4] the modes become more skewed and Markov chain 
generated by the ARMS algorithm remains trapped at one of the two modes in ^ 10% of the simulations. 
When a G [4, 5] the same problem occurs in ^ 25% of the simulations. The performance of the GARS 
algorithm, on the other hand, is comparatively insensitive to the value of a. 



Figure n (a) shows the posterior density p{x\y) oc exp {— cosh(?/i 



X 



Q:(/i — exp(|x|))^} with 



a = 0.2 depicted as a dashed line, and the normalized histogram obtained with the GARS technique. 
Figure [7] (b) illustrates the acceptance rates (averaged over 10, 000 simulations) for the first 20 accepted 
samples drawn with the GARS algorithm. Every time a sample x^ drawn from 7rt{x) is rejected, it is 
incorporated as a support point. Then, the proposal pdf 7Tt{x) becomes closer to target pdf p{x\y) and, as 
a consequence, the acceptance rate becomes higher. For instance, the acceptance rate for the first sample 
is ^ 16%, but for the second sample, it is already ^ 53%. The acceptance rate for the 20-th sample is 
^ 90%. 

TABLE VI 

Estimated posterior mean, fi (for a = 5). 



Simulation 


1 


2 


3 


4 


5 


ARMS 


-2.2981 


0.0267 


0.0635 


0.0531 


2.2994 


GARS 


0.0772 


-0.0143 


0.0029 


0.0319 


0.0709 
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(a) (b) 




X number of accepted samples 



Fig. 7. (a) The bimodal density p{x\y) oc exp { — V{x; y, g)} (dashed Une) and the normahzed histogram of = 5000 samples 
obtained using GARS algorithm, (b) The curve of acceptance rates (averaged over 10, 000 simulations) as a function of the 
accepted samples. 



C. Example 3: Target localization with a sensor network 

In order to show how the proposed techniques can be used to draw samples from a multivariate 
(non-scalar) Sol, we consider the problem of positioning a target in a 2-dimensional space using range 
measurements. This is a problem that appears frequently in localization applications using sensor networks 

m. 

We use a random vector X = [Xi,X2]^ to denote the target position in the plane R^. The prior 
density of X is p(xi,X2) = p(xi)p(x2), where p{xi) = A^(xi; 0,1/2) = /c exp { — (x^)^}, z = 1,2, 
i.e., the coordinate Xi and X2 are i.i.d. Gaussian. The range measurements are obtained from two 
sensor located at hi = [0, 0]^ and h2 = [2, 2]^, respectively. The effective observations are the (square) 
Euclidean distances from the target to the sensors, contaminated with Gaussian noise, i.e., 

(42) 

y2 = (Xi-2)2 + (X2-2)2 + e2, 

where 6^, i = 1,2, are independent Gaussian variables with identical pdf's, A^(i9i; 0, 1/2) = 
fci exp {— Therefore, the marginal potentials are quadratic, Vi(^i) = z = 1,2. The random 
observation vector is denoted Y = [li,l2]^- We note that one needs three range measurements to 
uniquely determine the position of a target in the plane, so the posterior pdf p{x\y) oc p{y\x)p{x) is 
bimodal. 

We apply the Gibbs sampler to draw N particles x^^) = ^2 ^]^, i = 1, . . . , A^, from the posterior 
density p{x\y) oc p(y|xi, X2)p(xi)p(x2). The algorithm can be summarized as follows: 

1) Set i = 1, and draw ^2^^ from the prior pdf p{x2). 

2) Draw a sample x^^^ from the conditional pdf p(xi|y, ^2^^), and set x^^^ = [ X ^ Xi^ '\ 
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3) Draw a sample x^^^"^ from the conditional pdf p(x2|y, x^^^). 

4) Increment i = i-\-l.lfi>N stop, else go back to step 2. 

The Markov chain generated by the Gibbs sampler converges to a stationary distribution with pdf 

p(xi,x2|y). 

In order to use Gibbs sampling, we have to be able to draw from the conditional densities p{xi\y, x^^) 
and p{x2\y^x^l^). In general, these two conditional pdf's can be non-log-concave and can have several 
modes. Specifically, the density p{xi\y^X2^) oc p{y\xi^X2^)p{xi) can be expressed as p{xi\y^X2^) 
exp{-y(xi;yi,gi)} where y^ = [yi - (4'^)^y2 - {xf - 2)^0]^, g^{x) = [x^, (x - 2)2,x]^ and 



oc 



n^i;yi,gi)= yi-{xff-xl + y2 - (4'' - 2)^ - (Xi - 2 



2 



2 



+ X?, (43) 



while the pdf p{x2\y,xf') ccp{y\x2,x^^^)p{x2) can be expressed as p(x2|y, oc exp{— y(x2; y2, §2)} 
where = [yi - {xf)\ - {x^ - 2)^, 0]^, l^i^) = [x^, (x - 2)^, x]^ and 



^(a;2;y2,g2) 



yx-{xff-xl + y2-(xr-2)^-(x2-2 



2 



(^) 



+ ^2- (44) 



Since the marginal potentials and the nonlinearities are convex, we can use the GARS technique to sample 
the conditional pdf's. 

We have generated = 10, 000 samples from the Markov chain, with fixed observations = 5 
and y2 = 2. The average acceptance rate of the GARS algorithm was ^ 30% both for p{x\\y^X2) and 
p(x2|y, x\). Note that this rate is indeed as a average because, at each step of the chain, the target pdf's 
are different (if, e.g., x^l^ ^ x^^~^^ then p(x2|y, x^^^) ^ p{x2\y^x^^~^^)). 

Figure [S] (a) shows the shape of the true target density p(xi,X2|y), while Figure [I] (b) depicts the 
normalized histogram with N = 10, 000 samples. We observe that it approximates closely the shape of 
target pdf. 

Finally, it is illustrative to consider the computational savings attained by using the GARS method 
when compared with a rejection sampler with a fixed bound. Specifically, we have run again the Gibbs 
sampler to generate a chain of 10,000 samples but, when drawing from p(xi|y, X2) and p{x2\y^xi), 
we have used RS with prior proposals (p(xi) and p{x2), respectively) and a fixed bound computed 
(analytically) with the method in Section |IV-C| for quadratic potentials. The average acceptance rate for 
the rejection sampler was ^ 4% and the time needed to generate the chain was approximately 10 times 
the time needed in the simulation with the GARS algorithm. 
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(a) (b) 




Fig. 8. (a) The target density p(x|y) = X2|y) oc p{y\xi , X2)p{xi)p{x2) . (b) The nonnaUzed histogram with N = 10, 000 
samples, using the GARS algorithm within a Gibbs sampler. 

VIII. Conclusions 

We have proposed families of generalized rejection sampling schemes that are particularly, but not 
only, useful for efficiently drawing independent samples from a posteriori probability distributions. The 
problem of drawing from posterior distributions appears very often in signal processing, e.g., see the 
target localization example in this paper or virtually any application that involves the estimation of a 
physical magnitude given a set of observations collected by a sensor network. We have introduced two 
classes of schemes. The procedures in the first class are aimed at the computation of upper bounds for 
the likelihood function of the signal of interest given the set of available observations. They provide the 
means to (quickly and easily) design sampling schemes for posterior densities using the prior pdf as a 
proposal function. Then, we have elaborated on the bound-calculation procedures to devise a generalized 
adaptive rejection sampling (GARS) algorithm. The latter is a method to construct a sequence of proposal 
pdf's that converge towards the target density and, therefore, can attain very high acceptance rates. It 
should be noted that the method introduced in this paper includes the classical adaptive rejection sampling 
scheme of |[8l as a particular case. We have provided some simple numerical examples to illustrate the 
use of the proposed techniques, including sampling from multimodal distributions (both with fixed and 
adaptive proposal functions) and an example of target localization using range measurements. The latter 
problem is often encountered in positioning applications of sensor networks. 
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Appendix 



Proposition: The state estimators xj G arg max ^(x|y,g) = arg min V^(x;y,g) belong to the interval 

xelBj] xelBj] 

Xj, i.e., 

Xj G Xj = [min (Afj), max (Afj)], (45) 

where Xj = {xij, . . . , Xnj} is the set of all simple estimates in Bj and Xj C Bj. 
Proof: We have to prove that the derivative of the system potential function is 

^ < 0, for all X < min (Xj) {x G [Bj]), (46) 

and 

^ > 0, for all x > max (Xj) {x G [Bj]), (47) 

so that all stationary points of V stay inside Xj = [min (Afj), max (Af^)]. Routine calculations yield the 
derivative 

rIV dm [rIV.l 

(48) 



and we aim to evaluate it outside the interval Xj. To do it, let us denote Xmin = min(A'j) and 
Xmax = max(A'j) and consider the cases ^ > and ^ < separately (recall that we have assumed 
the sign of ^ to remain constant in Bj). 

When ^ > and since, for every simple estimate, Xij > Xmin^ we obtain that yi = gi{xij) > 
9i{xmin) > 9i{x) Vx < Xmin- Then yi - gi{x) > 0, for all x < Xmin, and, due to properties (PI) 



and (P2) of marginal potential functions, ^ > for all i. As a consequence, ^ < 

\/x <C XfYiifi, X G 

When ^ < and Xij > Xmin. we obtain that yi = gi{xi^j) < gi{xmin) < 9i{x). Vx < Xmin- Then 
Vi — 9i{x) < for all x < Xmin and ^ ^ < 0, again because of (PI) and (P2). As a 



dV 



^^i=yi-9i(x)<0 



consequence, ^ < Vx < Xmin^ x G [Bj]. 

dx 



A similar argument for x > Xmax yields ^ > for all x > Xmax and completes the proof. □ 
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